ABSTRACT

The Poker Parallel Programming Environment is a graphics-based,
interactive system for programming the Configurable, Highly Parallel
(CHiP) Computer. Designed to support nearly all aspects of parallel
programming in one integrated system, Poker has been implemented as
a (~33,000 line) C program on the VAX 11/780 under UNIX. It provides
a number of novel features including graphics programming of
parallel processor communication.


Although much sequential programming can be accomplished with only
the support of a programming language, parallel programming is
generally too complex to be done with such rudimentary facilities
alone. The Poker System is an interactive programming environment to
support the Configurable, Highly Parallel (CHiP) Computer [1]. The
Poker System is not itself a parallel program, but rather it is a
'front end', implemented on the VAX 11/780 under UNIX. It is a front
end to a preprototype version of the CHiP hardware, called the
Pringle, which is a 64 processor parallel computer emulating the
CHiP [2]. In addition, Poker is a front end for a complete software
emulator [...]


For example, Figure 1 gives an algorithm that uses a binary tree as
the communication graph. The algorithm finds the maximum of a set of
numbers (stored one per process in a local variable called 'val')
and then multiplies each number by the maximum. The maximum is found
by 'floating' the largest value in each subtree to the root of that
subtree. Then the global maximum is broadcast back through the tree
where each process multiplies it times its local 'val'. Notice that
although there are fifteen processes in the tree, there are only
three types of processes used.
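The two phases of this algorithm can be illustrated with a short sequential simulation. This is only a sketch in Python, not XX code: the heap-style node numbering (node i has children 2i+1 and 2i+2) and the function names are assumptions made for the illustration, and the type name "ancestor" for interior nodes follows the caption of Figure 1.

```python
# Sequential simulation of the tree algorithm (illustrative, not XX code).
# Assumed numbering: node i has children 2*i + 1 and 2*i + 2.

def max_then_scale(vals):
    """Return each local value multiplied by the global maximum.

    `vals` holds one local `val` per process; its length must describe
    a complete binary tree (1, 3, 7, 15, ... nodes).
    """
    n = len(vals)
    if n == 0 or (n + 1) & n != 0:
        raise ValueError("need 2**k - 1 values for a complete binary tree")

    # Upward phase: float the largest value of each subtree to its root.
    subtree_max = list(vals)
    for i in range(n - 1, 0, -1):          # deepest nodes first, root excluded
        parent = (i - 1) // 2
        subtree_max[parent] = max(subtree_max[parent], subtree_max[i])

    # Downward phase: the root now holds the global maximum; broadcast it.
    received = [None] * n
    received[0] = subtree_max[0]
    for i in range(1, n):
        received[i] = received[(i - 1) // 2]

    # Each process multiplies the broadcast value by its local val.
    return [v * m for v, m in zip(vals, received)]


def process_type(i, n):
    """Classify node i of an n-node tree as one of the three process types."""
    if i == 0:
        return "root"
    return "leaf" if 2 * i + 1 >= n else "ancestor"
```

For the fifteen-node tree, `process_type` yields one root, six ancestor nodes and eight leaves, which is why three process definitions suffice.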

The conversion of this algorithm to run on a CHiP computer, i.e.,
the programming, is straightforward. It involves

(a) embedding the communication graph into the switch lattice,

(b) programming the process types in a sequential programming
language,

(c) assigning one of the process types to each processor,

(d) naming the data path ports, and

(e) compiling, assembling, coordinating, and loading the program.


We consider each of these activities in turn.

Embedding the communication graph into the switch lattice requires
that we program the switches of the lattice so that the processors
have a topology that matches (or is a superset of) the topology of
the communication graph. This embedding operation is done
graphically (rather than symbolically) in the Poker System using the
Switch Settings mode. Figure 2 illustrates a particular embedding of
the fifteen node binary tree into the lattice. Processor (U) is the
root of the processor tree, processor (1,1) is a leaf, and processor
(14) is unused.
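The "matches or is a superset of" condition can be stated as a simple check. The sketch below is an illustration in Python, not part of Poker: the function name, the representation of the switch settings as a list of processor-to-processor links, and the coordinates in the usage note are all assumptions.

```python
def is_valid_embedding(comm_edges, placement, configured_links):
    """True if every communication edge maps onto a configured data path.

    comm_edges       -- iterable of (process, process) pairs
    placement        -- dict mapping each process to a processor coordinate
    configured_links -- iterable of (processor, processor) pairs that the
                        switch settings connect (direction is ignored)
    """
    # Each processor may host at most one process.
    if len(set(placement.values())) != len(placement):
        return False
    links = {frozenset(link) for link in configured_links}
    # Extra configured links are allowed: the topology may be a superset.
    return all(
        frozenset((placement[a], placement[b])) in links
        for a, b in comm_edges
    )
```

For a three-node tree placed on a hypothetical 2 x 2 lattice, `is_valid_embedding([("root", "l"), ("root", "r")], {"root": (1, 1), "l": (1, 2), "r": (2, 1)}, [((1, 1), (1, 2)), ((2, 1), (1, 1)), ((2, 1), (2, 2))])` holds even though one configured link is unused.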

Next we program the three process types in a sequential language,
XX. Each process is viewed as a procedure with (optional) parameters
and local variables. In addition to the usual declarations we must
specify the port names, symbolic names used by a process to refer to
other processes with which it communicates. Figure 3 shows the XX
code for the three process types. In the programs the symbol '<-' is
used for input/output; assigning to a port name, e.g., PARENT <- val,
causes output, and assigning from a port name, e.g., smi <- PARENT,
causes input.

The construction of the processor tree in the switch lattice to
match the communications graph gives an implicit association between
the processes of the algorithm and the processors. We make this
relationship explicit by assigning process names to the appropriate
processors using the Code Names mode of the Poker System. Figure 4
gives the result.

Next, the port names mentioned in each process must be associated
with a specific data path. Each processor has eight ports
corresponding to the compass points. Only those connected by an
active data path to another PE need be named. This activity is
performed using the Port Names mode of Poker. Figure 5 shows the
result of naming the ports.

The algorithm is now programmed. Next, each process type mentioned
in the Code Names specification is compiled into assembly code. The
assembly code is then 'coordinated,' i.e., modified so that the CHiP
Computer can run it synchronously. The coordinated programs are
assembled to produce processor object code. The interconnection
structure is 'compiled' to produce switch object code. The object
codes are loaded into the machine and executed.

Description of the Poker Environment

In the last section we used the Poker Programming Environment to
embed graphs, to define processes, to assign processes to
processors, and to declare port names. The discussion implied the
existence of certain facilities in the Poker System. Now we give a
more complete description of those facilities.

The Poker System is an interactive programming environment that uses
two displays: the primary display is a high resolution (1024 x [...]
point) bit-mapped display, and the secondary display is a
conventional character display. The two displays are used to
increase the amount of information available to the programmer. Most
activity takes place on the primary display; XX programming is
usually done on the secondary display.

The primary display has the form illustrated in Figures 2-5. The
bottom square region, called the field, is where most of the
programming activity takes place. The field always displays some
schematic representation of the two dimensional array of processors
being programmed. The exact form of the representation changes
depending on whether the programmer is performing a graph embedding,
a process assignment, a port declaration, etc. Since the field is
not always large enough to show the whole schematic representation,
a map of that portion being displayed is given in the upper
left-hand corner. Status information, diagnostics and miscellaneous
data are given in the upper right region of the display, called the
chalkboard. The bottom line of the chalkboard is the command line,
used for specifying the few textual commands required by Poker, such
as reading library files.

The logical structure of the Poker Environment is shown in Figure 6.
It provides an integrated set of facilities to

   [...]

   o assign processes to processors (Code Naming),

   o declare port names (Port Naming),

   o compile, coordinate, assemble and load (Command Request),

   o execute, trace, peek and poke (Trace Values).

We now describe each of these facilities in detail.

Architectural definition. Because Poker is intended to be a
laboratory tool for studying CHiP programming, it has been designed
to support a number of CHiP family architectures. Programs can be
written for logical CHiP machines with from 4 to 4096 processors.
All of these logical machines can be emulated using a software
emulator, and one family member, the 64 processor version, will be
able to be run on a hardware emulator, the Pringle [2], when it is
completed. Consequently, the programmer begins using Poker by
specifying the characteristics of the underlying logical
architecture. These include the number of processing elements and
the amount of routing capability needed for the lattice (corridor
width [1]). The default parameters are those that match the machine
defined in the previous session, or, if there was none, then the
parameters of the Pringle hardware.
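The defaulting rule for the architectural parameters can be sketched as follows. This is an illustrative Python model, not Poker's actual data structure: the class and function names are hypothetical, and the corridor width given for the Pringle is a placeholder, since the text states only that the Pringle has 64 processors.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChipParameters:
    processors: int       # number of processing elements
    corridor_width: int   # routing capability of the lattice

    def __post_init__(self):
        # The 4..4096 range is stated in the text.
        if not 4 <= self.processors <= 4096:
            raise ValueError("logical machines have from 4 to 4096 processors")
        # Assumption: a corridor must be at least one switch wide.
        if self.corridor_width < 1:
            raise ValueError("corridor width must be positive")


# Placeholder corridor width (assumed); only the processor count is sourced.
PRINGLE_DEFAULTS = ChipParameters(processors=64, corridor_width=1)


def default_parameters(previous_session: Optional[ChipParameters]) -> ChipParameters:
    """Use the previous session's machine if there was one, else the Pringle."""
    if previous_session is not None:
        return previous_session
    return PRINGLE_DEFAULTS
```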

Graph embedding. The field of the primary display shows the lattice
of the current architecture, as illustrated in Figure 2. The
activity is largely that previously described; the programmer
connects the processors (represented as boxes) with line segments to
define edges. Graphics primitives based on cursor keys permit edges
to be drawn and erased. Facilities are available for following graph
edges, managing the display (e.g., centering), saving embeddings,
reading in library embeddings, etc.

Programming the process code segment. The XX (Dos Equis) sequential
programming language is a simple scalar language for defining
processes. The language has four data types (Boolean, character,
integer and real), the common control structures (while, for,
if-then-else, etc.), vectors and the usual supply of scalar
arithmetic and logical operators. In addition to data type
declarations, one can also declare scalar variables to be port
names, procedure parameters, or variables to be traced. Input/output
is performed by assigning from or to a port name. The semantics are
'data driven': writes occur immediately and reads wait on the
arrival of data, if necessary. XX process codes are generally
developed on the secondary display using a standard editor.
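The data-driven port semantics (writes complete immediately, reads wait for data) can be imitated with threads and queues. The following is a Python sketch, not XX and not Poker's emulator: the `DataPath` class, the use of an unbounded buffer to make writes non-blocking, and the three-node example are assumptions chosen for illustration.

```python
import queue
import threading


class DataPath:
    """One direction of a data path; unbounded, so a write never waits."""

    def __init__(self):
        self._q = queue.Queue()

    def write(self, value):
        self._q.put(value)        # returns immediately

    def read(self):
        return self._q.get()      # waits until a value has arrived


def leaf(val, to_parent, from_parent, results, name):
    to_parent.write(val)                        # like: PARENT <- val
    results[name] = val * from_parent.read()    # read waits for the broadcast


def root(val, children, results):
    # children is a list of (up, down) DataPath pairs, one per child.
    best = max([val] + [up.read() for up, _ in children])
    for _, down in children:
        down.write(best)
    results["root"] = val * best


def run_three_node_tree(root_val, left_val, right_val):
    results = {}
    left = (DataPath(), DataPath())
    right = (DataPath(), DataPath())
    threads = [
        threading.Thread(target=root, args=(root_val, [left, right], results)),
        threading.Thread(target=leaf, args=(left_val, left[0], left[1], results, "left")),
        threading.Thread(target=leaf, args=(right_val, right[0], right[1], results, "right")),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because reads block until data arrives, the result does not depend on the order in which the three threads happen to be scheduled.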

Process assignment. The processors are assigned processes using a
field display on the primary terminal like that in Figure 4. The
programmer enters the name of the process procedure on the first
line of the processor box. If the procedure has formal parameters,
then values for the actual parameters can be entered on the
following (four) lines. Facilities are provided for buffering the
contents of a box and then automatically depositing the contents of
the buffer into processors in whole regions of the processor array.
In this way the programmer is saved from manually entering repeated
information when the algorithm exhibits uniformity.
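The buffer-and-deposit facility amounts to copying one box's contents into a rectangular region of the array. The sketch below is a Python illustration under assumed names: the dictionary representation of the processor array, the inclusive rectangle arguments and the `deposit` function are hypothetical, while the limit of four parameter lines comes from the text.

```python
MAX_PARAMETER_LINES = 4   # the text allows four lines of actual parameters


def deposit(grid, buffer, top, left, bottom, right):
    """Copy a buffered box into every processor of an inclusive rectangle.

    grid   -- dict mapping (row, col) to box contents
    buffer -- (procedure_name, actual_parameters) captured from one box
    """
    name, params = buffer
    if len(params) > MAX_PARAMETER_LINES:
        raise ValueError("a processor box holds at most four parameter lines")
    region = [(r, c)
              for r in range(top, bottom + 1)
              for c in range(left, right + 1)]
    missing = [cell for cell in region if cell not in grid]
    if missing:
        raise KeyError(f"no processor at {missing[0]}")
    for cell in region:
        grid[cell] = (name, list(params))   # each box gets its own copy
    return grid
```

With a hypothetical 4 x 4 array, `deposit(grid, ("leaf", []), 4, 1, 4, 4)` fills the whole bottom row with the leaf process in one operation.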

Port declarations. The field of the primary display has the form
illustrated in Figure 5. Each processor has up to eight incident
edges as a result of the graph embedding, and it has been assigned a
process which [...] using the port definitions. The [...] box is
divided [...]

     n port       ne port
     nw port      e port
     w port       se port
     sw port      s port

The programmer enters the names used by the assigned process code
into the window for that edge. The names are clipped to the first
five characters. Facilities are provided for displaying unclipped
names in the chalkboard, and like the process assignment, it is
possible to buffer port assignments and deposit them automatically
in whole regions of the processor array.
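The clipping of port names for display can be shown in a few lines. This Python sketch is illustrative only: the function name and the particular compass directions used in the usage note are assumptions; the eight ports and the five-character limit come from the text.

```python
PORTS = ("n", "ne", "e", "se", "s", "sw", "w", "nw")   # eight compass ports
DISPLAY_WIDTH = 5                                      # names are clipped to five characters


def name_ports(names):
    """Validate one processor's port names.

    names -- dict mapping a compass port to the name used by the process code
    Returns (full, displayed): the unclipped names and the clipped ones
    shown in the port windows.
    """
    unknown = set(names) - set(PORTS)
    if unknown:
        raise ValueError(f"not a compass port: {sorted(unknown)}")
    full = dict(names)
    displayed = {port: name[:DISPLAY_WIDTH] for port, name in full.items()}
    return full, displayed
```

For instance, `name_ports({"n": "PARENT", "sw": "LCHILD", "se": "RCHILD"})` keeps the full names for later use while the windows would show `PAREN`, `LCHIL` and `RCHIL`.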

Program translation. The preceding facilities provide a means of
specifying the elements of a Poker program. They are then converted
into executable form. The XX compiler converts each process to
assembly code. The coordinator [3] then attempts to convert the
process assigned to each processor into a form that permits the
entire program to run with synchronous (i.e., not data-driven)
execution. (This step can be by-passed and the processes can be run
in data-driven form.) If coordination is successful, the processors
may all have different assembly codes associated with them. In any
event the assembler converts the assembly code to object form. The
connector 'compiles' the graphical representation of the
communication graph into an object form. The object code and the
object graph as well as the actual parameter values are loaded into
the emulator (or the Pringle).

Execution. The resulting program is executed and the traced
variables are displayed; the field is similar to that used for
process assignment. The execution can proceed for a given number of
steps, or until a displayed value changes. When the execution is
suspended, any of the displayed values can be changed. When
execution resumes these new values are poked back into the pro[...]
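The two stopping conditions for execution can be sketched as a small control loop. This is a Python illustration with hypothetical names: `step` and `read_traced` stand in for whatever the emulator provides, and combining the step limit and the change test in one function is a choice made for this sketch, not something the text specifies.

```python
def run_until(step, read_traced, max_steps):
    """Run until `max_steps` steps have executed or a traced value changes.

    step        -- callable that advances the program by one step
    read_traced -- callable returning a dict of the displayed values
    Returns (steps_taken, changed_names).
    """
    before = dict(read_traced())
    for taken in range(1, max_steps + 1):
        step()
        after = read_traced()
        changed = sorted(k for k in after if after[k] != before.get(k))
        if changed:
            return taken, changed      # suspend: a displayed value changed
    return max_steps, []               # suspend: step budget exhausted
```

While execution is suspended, a caller could overwrite entries in the underlying state before calling `run_until` again, which corresponds to poking new values back in.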

Further  detail  about  the  Poker  Environment  can  be 
found  in  the  references  (44]. 
Figure 1. An algorithm; each leaf is an instance of the leaf
process, the root is an instance of the root process and all other
nodes are instances of the ancestor process.

Figure 2. An embedding of the 15 node binary tree.

Figure 3. Code for the three process types. (Only fragments of the
listing survive: the header of the root process and the port names
LCHILD and RCHILD.)


Figure 4. Assignment of process names; note that the name was
clipped to five characters.

Figure 5. The specification of the port names; note that the names
have been clipped to the first five characters.


Figure 6. The logical structure of the Poker Environment.